The Structure Transfer Machine Theory and Applications
Representation learning is a fundamental but challenging problem, especially
when the distribution of data is unknown. We propose a new representation
learning method, termed Structure Transfer Machine (STM), which enables the
feature learning process to converge to the representation expectation in a
probabilistic way. We theoretically show that such an expected value of the
representation (mean) is achievable if the manifold structure can be
transferred from the data space to the feature space. The resulting structure
regularization term, named manifold loss, is incorporated into the loss
function of the typical deep learning pipeline. The STM architecture is
constructed to enforce the learned deep representation to satisfy the intrinsic
manifold structure from the data, which results in robust features that suit
various application scenarios, such as digit recognition, image classification
and object tracking. Compared to state-of-the-art CNN architectures, we achieve
better results on several commonly used benchmarks. The source code is
available at https://github.com/stmstmstm/stm
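As a rough illustration of the kind of structure regularizer described above, the sketch below penalizes features whose data-space nearest neighbours end up far apart in feature space. This is a generic manifold-regularization sketch, not the paper's exact manifold loss; the function name, the binary k-NN graph, and the weighting are all illustrative assumptions:

```python
import numpy as np

def manifold_loss(X, Z, k=3):
    """Illustrative structure-transfer regularizer (not the paper's exact
    formulation): penalize feature vectors Z whose data-space nearest
    neighbours (computed from X) are far apart in feature space."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances in the data space
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Binary k-NN affinity graph (index 0 in argsort is the point itself)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = 1.0
    # Average feature-space distance over data-space neighbour pairs
    f2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return float((W * f2).sum() / W.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
loss_good = manifold_loss(X, X)                         # features mirror the data
loss_bad = manifold_loss(X, rng.normal(size=(10, 5)))   # unrelated features
```

Features that preserve the data-space neighbourhood structure incur a smaller penalty than unrelated features, which is the behaviour the regularizer is meant to encourage.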
Episodic Multi-Task Learning with Heterogeneous Neural Processes
This paper focuses on the data-insufficiency problem in multi-task learning
within an episodic training setup. Specifically, we explore the potential of
heterogeneous information across tasks and meta-knowledge among episodes to
effectively tackle each task with limited data. Existing meta-learning methods
often fail to take advantage of crucial heterogeneous information in a single
episode, while multi-task learning models neglect reusing experience from
earlier episodes. To address the problem of insufficient data, we develop
Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within
the framework of hierarchical Bayes, HNPs effectively capitalize on prior
experiences as meta-knowledge and capture task-relatedness among heterogeneous
tasks, mitigating data-insufficiency. Meanwhile, transformer-structured
inference modules are designed to enable efficient inferences toward
meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful
functional priors for adapting to novel heterogeneous tasks in each meta-test
episode. Experimental results show the superior performance of the proposed
HNPs over typical baselines, and ablation studies verify the effectiveness of
the designed inference modules.
Comment: 28 pages, spotlight of NeurIPS 2023
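For readers unfamiliar with neural processes, the core amortized-inference idea the abstract builds on (encode context pairs, pool them into a global representation, and condition predictions on it) can be sketched as follows. The weights, sizes, and function names here are toy assumptions, not the HNPs architecture:

```python
import numpy as np

def neural_process_predict(ctx_x, ctx_y, tgt_x, rng=None):
    """Minimal deterministic neural-process sketch (toy fixed weights, not
    HNPs): encode each context (x, y) pair, mean-pool into a global
    representation r, and condition target predictions on r."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = 16
    W_enc = rng.normal(size=(2, d))       # encodes one (x, y) context pair
    W_dec = rng.normal(size=(d + 1, 1))   # decodes (r, x_target) into a prediction
    pairs = np.stack([ctx_x, ctx_y], axis=1)      # (n_ctx, 2)
    r = np.tanh(pairs @ W_enc).mean(axis=0)       # (d,) pooled context representation
    h = np.concatenate([np.tile(r, (len(tgt_x), 1)),
                        tgt_x[:, None]], axis=1)  # (n_tgt, d + 1)
    return (h @ W_dec).ravel()                    # (n_tgt,) predictions

ctx_x = np.linspace(0.0, 1.0, 5)
ctx_y = np.sin(ctx_x)
preds = neural_process_predict(ctx_x, ctx_y, np.array([0.2, 0.5, 0.8]))
```

The pooling step is what makes the inference amortized: one forward pass over the context produces a representation reused for every target input, rather than optimizing per-task parameters from scratch.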
Solar Flare Intensity Prediction With Machine Learning Models
We develop a mixed long short-term memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hr time window 0–24, 6–30, 12–36, and 24–48 hr ahead of time, using 6, 12, 24, and 48 hr of data (predictors) for each Helioseismic and Magnetic Imager (HMI) Active Region Patch (HARP). The model makes use of (1) the Space-Weather HMI Active Region Patch (SHARP) parameters as predictors and (2) the exact flare intensities, instead of class labels, recorded in the Geostationary Operational Environmental Satellites (GOES) data set, which serves as the source of the response variables. Compared to solar flare classification, the model offers more detailed information about the exact maximum flux level, that is, intensity, for each occurrence of a flare. We also consider classification models built on top of the regression model and obtain better results in solar flare classification as compared to Chen et al. (2019, https://doi.org/10.1029/2019SW002214). Our results suggest that the most efficient time period for predicting solar activity is within 24 hr before the prediction time, using the SHARP parameters and the LSTM model.

Key Points:
- We develop deep learning models to predict solar flare intensity values, instead of flare classes, directly from SHARP parameters in the SDO/HMI data set
- We use time-series information from both flaring and non-flaring times in our model
- As opposed to solar flare classification, directly predicting solar flare intensity gives more detailed information about every occurrence of flares of each class

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/156246/2/swe21001_am.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/156246/1/swe21001.pdf
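The regression setup described above, using a fixed-length history of predictors to predict the maximum intensity over a future window, amounts to a windowing step over the time series. The sketch below assumes an hourly cadence and a scalar series standing in for the SHARP predictor vectors; it is a generic illustration, not the paper's actual pipeline:

```python
import numpy as np

def make_windows(flux, n_in=6, horizon=24):
    """Build (predictor, response) pairs for max-intensity regression:
    use `n_in` hours of features to predict the maximum value over the
    next `horizon` hours. `flux` stands in for the GOES intensity series;
    real predictors would be SHARP parameter vectors per time step."""
    X, y = [], []
    for t in range(n_in, len(flux) - horizon + 1):
        X.append(flux[t - n_in:t])          # past n_in hours of predictors
        y.append(flux[t:t + horizon].max()) # max intensity in the future window
    return np.array(X), np.array(y)

flux = np.arange(40.0)        # toy monotone series, 40 hourly samples
X, y = make_windows(flux)     # 6 hr of inputs, 0-24 hr prediction window
```

Each row of `X` would then feed an LSTM, with the corresponding entry of `y` as the regression target.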
Identifying Solar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters
In this paper, we present several methods for constructing precursors of solar
flare events, which show great promise for early prediction. A
data pre-processing pipeline is built to extract useful data from multiple
sources, Geostationary Operational Environmental Satellites (GOES) and Solar
Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI), to prepare
inputs for machine learning algorithms. Two classification models are
presented: classification of flares from quiet times for active regions and
classification of strong versus weak flare events. We adopt deep learning
algorithms to capture both the spatial and temporal information from HMI
magnetogram data. Effective feature extraction and feature selection with raw
magnetogram data using deep learning and statistical algorithms enable us to
train classification models to achieve almost as good performance as using
active region parameters provided in HMI/Space-Weather HMI-Active Region Patch
(SHARP) data files. Case studies show a significant increase in the prediction
score around 20 hours before strong solar flare events.
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Pre-trained vision-language models, e.g., CLIP, working with manually
designed prompts have demonstrated a great capacity for transfer learning.
Recently, learnable prompts have achieved state-of-the-art performance, but
they are prone to overfitting to seen classes and fail to generalize to unseen
classes.
In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for
vision-language models. Our approach takes inspiration from human intelligence
in which external knowledge is usually incorporated into recognizing novel
categories of objects. Specifically, we design two complementary types of
knowledge-aware prompts for the text encoder to leverage the distinctive
characteristics of category-related external knowledge. The discrete prompt
extracts the key information from descriptions of an object category, and the
learned continuous prompt captures overall contexts. We further design an
adaptation head for the visual encoder to aggregate salient attentive visual
cues, which establishes discriminative and task-aware visual representations.
We conduct extensive experiments on 11 widely-used benchmark datasets and the
results verify the effectiveness of KAPT in few-shot image classification, especially
in generalizing to unseen categories. Compared with the state-of-the-art CoCoOp
method, KAPT exhibits favorable performance and achieves an absolute gain of
3.22% on new classes and 2.57% in terms of harmonic mean.
Comment: Accepted by ICCV 2023
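The two prompt types can be pictured schematically: a "discrete" prompt distilled from a category description's token embeddings, concatenated with a learnable "continuous" context prompt. The sketch below is purely illustrative; the shapes, initialization, and names are assumptions, not KAPT's implementation:

```python
import numpy as np

def build_prompt(desc_emb, n_ctx=4, dim=8, rng=None):
    """Toy sketch of the two complementary prompt types: a 'discrete' prompt
    summarizing the token embeddings of a category description, concatenated
    with a freshly initialized learnable 'continuous' context prompt."""
    rng = rng if rng is not None else np.random.default_rng(0)
    discrete = desc_emb.mean(axis=0, keepdims=True)         # (1, dim) key info
    continuous = rng.normal(scale=0.02, size=(n_ctx, dim))  # (n_ctx, dim) learned context
    return np.concatenate([continuous, discrete], axis=0)   # (n_ctx + 1, dim)

desc_emb = np.ones((5, 8))        # stand-in for description token embeddings
prompt = build_prompt(desc_emb)   # rows: 4 continuous context vectors + 1 discrete summary
```

In a real prompt-tuning setup, only the continuous rows would receive gradients, while the discrete summary stays fixed per category.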
Predicting Solar Flares Using CNN and LSTM on Two Solar Cycles of Active Region Data
We consider the flare prediction problem that distinguishes flare-imminent
active regions that produce an M- or X-class flare in the future 24 hours, from
quiet active regions that do not produce any flare within hours. Using
line-of-sight magnetograms and parameters of active regions in two data
products covering Solar Cycle 23 and 24, we train and evaluate two deep
learning algorithms -- CNN and LSTM -- and their stacking ensembles. The
decisions of CNN are explained using visual attribution methods. We have the
following three main findings. (1) LSTM trained on data from two solar cycles
achieves significantly higher True Skill Scores (TSS) than that trained on data
from a single solar cycle with a confidence level of at least 0.95. (2) On data
from Solar Cycle 23, a stacking ensemble that combines predictions from LSTM
and CNN using the TSS criterion achieves significantly higher TSS than the
"select-best" strategy with a confidence level of at least 0.95. (3) A visual
attribution method called Integrated Gradients is able to attribute the CNN's
predictions of flares to the emerging magnetic flux in the active region. It
also reveals a limitation of CNN as a flare prediction method using
line-of-sight magnetograms: it treats the polarity artifact of line-of-sight
magnetograms as positive evidence of flares.
Comment: 31 pages, 16 figures, accepted in the ApJ
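The True Skill Score used as the evaluation and stacking criterion above is a standard binary-classification skill measure, TSS = TP/(TP+FN) - FP/(FP+TN); a minimal implementation:

```python
def true_skill_score(y_true, y_pred):
    """True Skill Score for binary labels:
    TSS = TP/(TP+FN) - FP/(FP+TN), i.e. hit rate minus false-alarm rate.
    Ranges from -1 to 1; 0 means no skill over random guessing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn) - fp / (fp + tn)
```

TSS is popular in flare forecasting because, unlike accuracy, it is insensitive to the heavy class imbalance between flaring and quiet regions.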
Attentional Prototype Inference for Few-Shot Segmentation
This paper aims to address few-shot segmentation. While existing
prototype-based methods have achieved considerable success, they suffer from
uncertainty and ambiguity caused by limited labeled examples. In this work, we
propose attentional prototype inference (API), a probabilistic latent variable
framework for few-shot segmentation. We define a global latent variable to
represent the prototype of each object category, which we model as a
probabilistic distribution. The probabilistic modeling of the prototype
enhances the model's generalization ability by handling the inherent
uncertainty caused by limited data and intra-class variations of objects. To
further enhance the model, we introduce a local latent variable to represent
the attention map of each query image, which enables the model to attend to
foreground objects while suppressing the background. The optimization of the
proposed model is formulated as a variational Bayesian inference problem,
implemented by amortized inference networks. We conduct extensive
experiments on four benchmarks, where our proposal obtains at least competitive
and often better performance than state-of-the-art prototype-based methods. We
also provide comprehensive analyses and ablation studies to gain insight into
the effectiveness of our method for few-shot segmentation.
Comment: Pattern Recognition Journal
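The prototype idea that API builds on can be sketched deterministically: masked average pooling over support features yields a class prototype, and cosine similarity to each query location yields a coarse segmentation map. The paper goes further by modelling the prototype as a probabilistic distribution; the sketch below covers only the deterministic baseline, with illustrative names and shapes:

```python
import numpy as np

def prototype_segment(support_feat, support_mask, query_feat):
    """Deterministic prototype baseline (not API's probabilistic model):
    masked average pooling of (H, W, C) support features under an (H, W)
    binary mask gives a class prototype; cosine similarity of every query
    location to the prototype gives an (H, W) similarity map."""
    proto = (support_feat * support_mask[..., None]).sum((0, 1)) / support_mask.sum()
    q = query_feat / np.linalg.norm(query_feat, axis=-1, keepdims=True)
    p = proto / np.linalg.norm(proto)
    return q @ p  # (H, W) cosine-similarity map

fg, bg = [1.0, 0.0], [0.0, 1.0]          # orthogonal toy feature vectors
support = np.array([[fg, bg], [bg, bg]])  # (2, 2, 2) feature map
mask = np.array([[1.0, 0.0], [0.0, 0.0]]) # foreground only at (0, 0)
sim = prototype_segment(support, mask, support)
```

Foreground locations in the query score 1.0 against the prototype and background locations score 0.0, so thresholding `sim` recovers the mask; API replaces the point-estimate `proto` with a latent distribution to handle uncertainty from limited support data.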